ABSTRACT

A norm distribution consisting of test scores received by 810 college
students on a 150-item, dichotomously scored, four-alternative
multiple-choice test was empirically estimated through several
item-examinee sampling procedures. The post mortem item-sampling
investigation was specifically designed to manipulate systematically
the variables of number of subtests, number of items per subtest, and
number of examinees responding to each subtest. Defining one
observation as the score received by one examinee on one item, the
results suggest that as the number of observations increases beyond
1.23 per cent of the data base, all procedures produce stochastically
equivalent results. The results of this investigation indicate that,
in estimating a norm distribution by item-sampling, the variable of
importance is not the item-sampling procedure per se but is instead
the number of observations obtained by the procedure. It should be
noted, however, that in this investigation the test score norm
distribution was approximately symmetrical, and the possibility
should not be overlooked that item-sampling as a procedure may be
robust for symmetrical norm distributions. (Author)

The negative hypergeometric distribution has been found to provide
a reasonably good fit for a variety of test score distributions where
the test score is the number of correct answers (Keats & Lord, 1962).
The negative hypergeometric distribution is, within this context, a
function of three parameters: the number of test items and estimates
of the mean and variance of the normative distribution. Operating
within the framework of the item-sampling model, Lord (1960) has
provided the appropriate equations for computing unbiased estimates
of the first two moments of a frequency distribution and has,
furthermore, demonstrated (Lord, 1962) that a norm distribution may be
satisfactorily approximated by a negative hypergeometric distribution
fitted to parameters estimated through item-sampling. The procedure
is as follows:

1. The test items to be normed are divided into t subtests and
each subtest is administered to a different set of examinees.

2. The results obtained from each subtest (item-examinee sample)
provide an estimate of the mean μ and variance σ² of the norm
distribution when formulas 9 and 10 in Lord (1962) are applied.

A single estimate of μ is obtained by averaging the t estimates
of μ obtained from each item-examinee sample; a single estimate
of σ² is obtained by averaging the t estimates of the population
variance.

3. Substituting each possible test score x into the negative
hypergeometric function specified in equation 23.6.10 in
Lord and Novick (1968) produces an estimate of the proportion
of examinees in the norm population receiving that test
score.¹
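
As a rough illustration of step 3, the sketch below fits the distribution by matching a given mean and variance. It is written in the beta-binomial parameterization, which describes the same family of distributions, rather than as a transcription of equation 23.6.10, and it does not apply the adjustments described in footnote 1. The function names are hypothetical; the input values are the norm-group figures reported in the Method section.

```python
import math

def fit_beta_binomial(n_items, mean, variance):
    """Method-of-moments fit: return (alpha, beta) matching mean and variance."""
    # KR-21 computed from the two moments; it fixes alpha + beta.
    kr21 = (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance))
    if not 0 < kr21 < 1:
        raise ValueError("moments are not compatible with this distribution")
    total = n_items * (1 / kr21 - 1)   # alpha + beta
    alpha = mean * total / n_items
    return alpha, total - alpha

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def score_proportions(n_items, alpha, beta):
    """Proportion of examinees expected at each score 0..n_items."""
    probs = []
    for x in range(n_items + 1):
        log_comb = (math.lgamma(n_items + 1) - math.lgamma(x + 1)
                    - math.lgamma(n_items - x + 1))
        log_p = (log_comb + log_beta(x + alpha, n_items - x + beta)
                 - log_beta(alpha, beta))
        probs.append(math.exp(log_p))
    return probs

alpha, beta = fit_beta_binomial(150, 87.390, 324.193)
probs = score_proportions(150, alpha, beta)
```

Working on the log scale with `math.lgamma` avoids overflow in the binomial coefficients for a 150-item test.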

Implementing the procedure outlined above raises many interesting
questions: How many different subtests t of items and examinees are
required to estimate satisfactorily the norm distribution? Is it
more appropriate to administer fewer subtests containing a larger
number of items or a larger number of subtests containing fewer
items? To how many examinees should each subtest be administered?
Must all items in the test be distributed among the subtests?
The project described herein was an attempt to provide tentative
answers to questions such as these.

Several investigations (e.g., Plumlee, 1964; Cahen et al., 1969;
Owens & Stufflebeam, 1969) have estimated parameters by item-sampling,
but only Cook and Stufflebeam (1967) have investigated the relative
merits of different item-sampling procedures in estimating a norm
distribution with the negative hypergeometric distribution. It should
be mentioned that the expressed purpose of their study was that of
contrasting two approaches in estimating a norm distribution: item
sampling, given the condition of sampling without replacement, and
examinee sampling. Cook and Stufflebeam concluded that item sampling
is as effective as examinee sampling.
In the Cook and Stufflebeam (1967) design, the number of subtests
is confounded with number of items per subtest and with number of
examinees receiving each subtest. Using the Cook and Stufflebeam
article as a point of departure, the present investigation was
specifically designed to manipulate systematically the variables of
number of subtests, number of items per subtest, and number of
examinees responding to each subtest to determine the relative merits
of several item-sampling procedures which might be used in estimating
a norm distribution.

METHOD 

The research design was one of a posteriori item-sampling: given
a norm distribution, various item-examinee samples are selected at
random from this data base and used to estimate the distribution from
which they have been sampled. In this investigation the norm
distribution consisted of test scores received by 810 college students
on a 150-item, dichotomously scored, 4-alternative multiple-choice
test administered as a final examination in the Spring of 1969. On
this examination the mean score μ was 87.390, with variance σ² of
324.193 and Kuder-Richardson Formula 21 reliability equal to .893.
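
The reliability figure can be checked directly from the reported mean and variance with the standard Kuder-Richardson Formula 21. The short sketch below is only a check on the arithmetic; the function name is illustrative.

```python
def kr21(n_items, mean, variance):
    """Kuder-Richardson Formula 21 from test length, mean, and variance."""
    return (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance))

norm_reliability = kr21(150, 87.390, 324.193)
print(round(norm_reliability, 3))  # 0.893
```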

The twenty item-sampling procedures used to estimate the norm
distribution are listed in Table 1. As all procedures, with one
exception,² are similar, only procedure 1 will be described in detail.



Please insert Table 1 about here. 
In procedure 1, the 150 test items were divided by randomly sampling
without replacement into 10 subtests each containing 15 items. From
the pool of 810 examinees, 10 groups of 10 examinees were selected
at random and without replacement. Each subgroup was administered
one subtest; that is, only those items in that subtest were scored for
those examinees. Procedure 1 produced 10 estimates of μ and σ².
The pooled estimate of μ was found to be 87.111; the standard deviation
of the 10 estimates of μ was 14.867. The pooled estimate of σ² was
318.185 and the standard deviation of the 10 estimates was 263.033.
Using these estimates of the parameters, the Kuder-Richardson Formula
21 reliability coefficient for the full-length test was computed to be
.891. The absolute value of the maximum difference D_max between the
cumulative relative negative hypergeometric distribution fitted to the
estimates of μ and σ² obtained from procedure 1 and the cumulative
relative negative hypergeometric distribution fitted to μ and σ²
was .038. D_max between each pair of distributions, the test statistic
for the Kolmogorov-Smirnov one-sample test for goodness-of-fit (Siegel,
1954), was selected from 150 differences.
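
The D_max statistic described above is the largest absolute gap between two cumulative relative distributions. A minimal sketch follows, taking two lists of proportions indexed by test score. The function name and the four-score toy input are illustrative only and are not data from the study. The sketch is not expected to reproduce the reported value of .038, which also depends on the parameter adjustments described in footnote 1.

```python
from itertools import accumulate

def d_max(proportions_a, proportions_b):
    """Largest absolute difference between two cumulative distributions."""
    if len(proportions_a) != len(proportions_b):
        raise ValueError("both distributions must cover the same scores")
    cumulative_a = accumulate(proportions_a)
    cumulative_b = accumulate(proportions_b)
    return max(abs(a - b) for a, b in zip(cumulative_a, cumulative_b))

# Toy example with four possible scores.
toy = d_max([0.1, 0.2, 0.4, 0.3], [0.2, 0.2, 0.3, 0.3])
print(toy)  # approximately 0.1
```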

Procedures 1 through 4 are similar to the item-sampling procedures
used by Cook and Stufflebeam (1967) with the exception that the number
of examinees receiving each subtest has been held constant. Procedures
5 through 8 are a replication of 1 through 4 with an increase in the
number of examinees receiving each subtest. In procedures 9 through 12
the number of items per subtest and the number of examinees receiving
each subtest have been held constant; in 13 through 16, the number of
subtests and the number of examinees receiving each subtest have been
held constant. In procedures 17 through 20 only the number of examinees
receiving each subtest has been held constant.
Each set of four procedures was a systematic exploration of the
Cook and Stufflebeam (1967) design. Certain procedures, i.e., 1 and
9, 2 and 13, 10 and 14, 2 and 18, 12 and 20, are identical and were
computed once; in each instance the results were recorded twice in
Table 1.
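
The item-examinee sampling used in procedure 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the response matrix is synthetic, the function names are hypothetical, and only the mean is estimated, by scaling each subtest mean up to full test length. The variance estimator (formula 10 in Lord, 1962) is not reproduced here, and the sketch does not cover procedure 17, where items could appear in more than one subtest.

```python
import random

def draw_samples(n_items, n_examinees, n_subtests, items_per_subtest,
                 examinees_per_subtest, rng):
    """Return one (item indices, examinee indices) pair per subtest."""
    # Sampling without replacement: no item or examinee is used twice.
    items = rng.sample(range(n_items), n_subtests * items_per_subtest)
    people = rng.sample(range(n_examinees),
                        n_subtests * examinees_per_subtest)
    samples = []
    for s in range(n_subtests):
        subtest = items[s * items_per_subtest:(s + 1) * items_per_subtest]
        group = people[s * examinees_per_subtest:
                       (s + 1) * examinees_per_subtest]
        samples.append((subtest, group))
    return samples

def pooled_mean_estimate(scores, samples, n_items):
    """Average the per-subtest estimates of the full-test mean."""
    estimates = []
    for subtest, group in samples:
        # Score only the sampled items for the sampled examinees.
        subtest_scores = [sum(scores[e][i] for i in subtest) for e in group]
        subtest_mean = sum(subtest_scores) / len(group)
        estimates.append(n_items * subtest_mean / len(subtest))
    return sum(estimates) / len(estimates)

rng = random.Random(0)
# Synthetic 0/1 responses standing in for the 810 x 150 data base.
scores = [[1 if rng.random() < 0.58 else 0 for _ in range(150)]
          for _ in range(810)]
samples = draw_samples(150, 810, 10, 15, 10, rng)
estimate = pooled_mean_estimate(scores, samples, 150)
```

Changing the three sampling arguments gives the other designs in which subtests do not share items.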



RESULTS AND DISCUSSION 

All results are recorded in Table 1. On the basis of the
Kolmogorov-Smirnov one-sample test,³ three procedures produced negative
hypergeometric distributions which were judged not to be stochastically
equivalent⁴ to the fitted norms distribution. In procedures 1 through
4, with the number of examinees per subtest being held constant, all
negative hypergeometric distributions were equivalent to the fitted
norms distribution. While it is of theoretical interest to note that
the smallest value of D_max occurred with the item-sampling procedure
involving a large number of subtests with few items per subtest, with
the converse also being true, the effect was nullified (procedures 5
through 8) by an increase in the number of examinees receiving each
subtest. Procedures 9 through 16 were designed to partial out the
effect noted in procedures 1 through 4. Holding the number of items
per subtest and the number of examinees per subtest constant
(procedures 9 through 12), an increase in the number of subtests
produced a negative hypergeometric distribution more stochastically
equivalent to the fitted norms distribution. Similar results were
obtained (procedures 13 through 16) with an increase in the number of
items per subtest, holding constant the number of subtests and number
of examinees per subtest. The results from procedures 17 through 20
suggest that beyond a certain point little is to be gained by
simultaneously increasing the number of subtests and the number of
items per subtest.

The inconsistencies found in Table 1 (e.g., procedures 17 and 20
producing negative hypergeometric distributions equivalent to the
fitted norms distribution) are made less alarming if D_max per
procedure is analyzed as a function of the number of observations (one
observation is equal to the score received by one examinee on one
item). For small numbers of observations the values of D_max are
variable and inconsistent; however, as the number of observations
increases beyond a certain point, all procedures produce equivalent
results. That certain point in this investigation was approximately
1.23% of the norm data base. It is not surprising, therefore, that
Lord (1962) and Plumlee (1964) obtained a good approximation with an
item-sampling procedure involving 10% of the total observations and
that similar results were obtained by Cook and Stufflebeam (1967) with
procedures involving percentages of total observations ranging from
9.18 to 49.45.
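
For a sense of scale, procedure 1 (10 subtests of 15 items, each scored for 10 examinees) yields 1,500 observations out of the 121,500 in the data base. The sketch below computes that percentage; the function name is illustrative, and the defaults are the test length and norm-group size reported in this study.

```python
def percent_of_data_base(n_subtests, items_per_subtest, examinees_per_subtest,
                         n_items=150, n_examinees=810):
    """Observations gathered by a procedure, as a percentage of all scores."""
    observations = n_subtests * items_per_subtest * examinees_per_subtest
    return 100 * observations / (n_items * n_examinees)

procedure_1 = percent_of_data_base(10, 15, 10)
print(round(procedure_1, 2))  # 1.23
```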
Item-sampling procedures with results

*Negative hypergeometric distribution not stochastically equivalent to fitted norms distribution.



FOOTNOTES 



¹ In the computer program used for calculating values of this proportion,
1/2 was added to a and b as defined in Lord and Novick (1968). Each
term was truncated before substitution into equation 23.6.10. A copy
of the Fortran program with documentation may be obtained upon request
from the author.

² The exception to this general pattern was found in procedure 17. Each
subtest was formed by randomly sampling without replacement 40 items from
the 150-item pool. It was, therefore, possible for a particular item to
appear in more than one subtest. Only 2 of the 150 items were not selected
for inclusion in any subtest.

³ A referee has pointed out, and correctly so, that the Kolmogorov-Smirnov
tests should be viewed as providing rough indications rather than strict
significance tests. Since the population was finite and since the
sampling was done without replacement, there is necessarily a closer
agreement between sample and population than there would be in random
sampling from an infinite population.

⁴ Two distributions are said to be stochastically equivalent if the two
distributions are distinct and if f(x) = g(x) for all x.

Figure 1. Norms group frequency distribution and negative hypergeometric distribution
fitted to the parameters. (Frequency polygons were used for graphic clarity.
Both distributions represent discrete, not continuous, variables.)



